npj Genomic Medicine
Springer Science and Business Media LLC
All preprints, ranked by how well they match npj Genomic Medicine's content profile, based on 33 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Schönegger, D.; Montellier, E.; Blanchet, S.; Freycon, C.; Monti, P.; Goudie, C.; Bougeard, G.; Kratz, C. P.; Hainaut, P.; Reymer, A.
Pathogenic germline variants in the TP53 gene cause Li-Fraumeni syndrome (LFS), a highly penetrant cancer predisposition disorder. Most of these variants arise from single-nucleotide variations (SNVs) in TP53 exons, causing missense mutations. However, some of these SNVs may also alter mRNA splicing, defining spliceogenic single nucleotide variants (SE-SNVs) of uncertain clinical significance. We reassessed previously classified TP53 missense variants for spliceogenic effects using SpliceAI predictions, in vitro minigene assays, and transcriptomic data from TCGA. Genotype-phenotype correlations were evaluated using clinical data from carriers of TP53 germline variants across multiple databases and registries. Among 58 identified SE-SNVs, 40 were missense and 18 synonymous. Experimental validation showed that most induce aberrant splicing events, frequently via cryptic splice site activation, leading to frameshift and premature stop codons. Several missense variants previously classified as having mild or low pathogenicity were found instead to have strong spliceogenic effects and were associated with early-onset cancers typical of LFS, suggesting that splicing alterations may override their protein-coding impact. The frequent SNV c.375G>A leading to the synonymous variant p.T125= shows intermediate severity, likely due to partial retention of normal splicing activity. Our study highlights the underestimated pathogenic potential of SE-SNVs affecting the TP53 gene. These findings underscore the importance of integrating splicing predictions, functional assays, and transcript-level analyses into TP53 variant interpretation to improve risk stratification in LFS.
Cook, J.; Coker, T.; Card-Gowers, J.; Webber, L.
Fabry disease is a rare lysosomal storage condition in which sphingolipid levels build up to harmful levels in various bodily organs, eventually leading to life-threatening complications such as stroke and kidney failure. Fabry disease is caused by rare pathogenic alleles in the GLA gene on chromosome X and may present as an early or late-onset disease depending on the identity of the causal allele and the severity of its effect on the gene product. Epidemiological studies have widely varied in their estimation of Fabry disease prevalence: estimates based on reported clinical cases range from 1 in 40,000 to 1 in 170,000 individuals, whilst recent estimates based on newborn screening are much higher, ranging from 1 in 1,250 to 1 in 21,973 individuals. The primary aim of this study was to estimate the prevalence of Fabry disease in the US in 2024 by analysing selected GLA variants mostly associated with late-onset Fabry disease, projecting their allele frequencies to the US population and applying penetrance data from the literature to calculate how many causal allele carriers would be expected to be symptomatic for the disease at some point within their lifetime. 8 causal genetic variants were selected for analysis in this study based on their inclusion in a previous Fabry disease study using data from the UK Biobank. Allele frequencies for all 8 variants in global ancestry groups were extracted from gnomAD v4.1. The size and demographic makeup of the US population in 2024 was obtained from the US Census Bureau and mapped to gnomAD v4.1 ancestry groups, using previously reported estimates of the ancestral composition of Census groups encompassing multiple ancestry groups. Carrier counts by sex and ethnic group were calculated by projecting the summed allele frequencies to the US population using the Hardy-Weinberg equation and taking into consideration the X-linked mode of inheritance, assuming each individual can only carry 1 pathogenic variant. 
It was found that pathogenic alleles are present in the gnomAD v4.1 sample for all variants in the non-Finnish European gnomAD ancestry group, for 2 variants in the South Asian ancestry group, and for 1 variant in the African / African American and East Asian ancestry groups. For the remaining 5 ancestry groups, there are no pathogenic alleles recorded in the gnomAD v4.1 dataset across all 8 variants included for analysis in the study. Results show the highest pathogenic allele carrier frequencies in the European (non-Finnish) ancestry group, followed by the South Asian, East Asian and African / African American ancestry groups. Using reported penetrance figures of 100% for males and 70% for females, it is estimated that the carrier and symptomatic populations of Fabry disease in the US in 2024, based on analysis of the 8 included variants, are 12,024 male carriers (or 1 in 14,022 males) who will all develop symptoms, and 24,845 female carriers (or 1 in 6,978 females), of whom 17,392 will develop symptoms. Of these carriers who will develop symptoms, around 98.6% (corresponding to 11,858 men and 17,153 women) will carry a variant primarily associated with late-onset or both forms of Fabry disease. The prevalence figures presented in this study are significantly higher than those based on reported clinical cases and are in line with those presented more recently based on newborn screening studies and with the prevalence reported in the UK Biobank analysis. The US National Institutes of Health reports Fabry disease prevalence at around 1 in 50,000 males (which would correspond to 1 in 25,000 females). Analysing just 8 of the potentially hundreds of causal variants within the GLA gene, this study suggests that Fabry disease may be over 3 times as prevalent as is currently believed.
This work highlights the vast potential of large genetic databases to analyse rare diseases, a potential that will continue to grow as these datasets add more data and improve in power and diversity.
What Is Already Known On This Topic
- Fabry Disease is a rare X-linked lysosomal storage disorder with historical prevalence estimates ranging from 1 in 40,000 to 1 in 170,000 males, based on case ascertainment.
- More recent newborn screening studies that test alpha-galactosidase A activity or perform genetic testing within the GLA gene, in addition to a UK Biobank study examining the prevalence of selected causal Fabry disease variants, have consistently suggested that Fabry disease may be far more prevalent than the estimates based on case ascertainment.
What This Study Adds
- To our knowledge, this is the first study providing population-level estimates of the number of causal Fabry disease carriers and of the symptomatic population in the US using publicly available data from gnomAD v4.1. Our estimates are consistent with those produced by newborn screening studies and the UK Biobank analysis, and suggest that late-onset Fabry disease may affect >1 in 10,000 people in the US in 2024 at some point during their lifetime.
- This study also demonstrates the potential of large genetic databases, such as gnomAD, for the study of rare genetic diseases, which are often misdiagnosed and may consequently be believed to be rarer than they are in reality.
How This Study May Affect Research
- This study highlights two areas for improvement which would be significantly beneficial to the study of rare genetic diseases.
  - While this study demonstrates the utility of genetic databases to study certain rare genetic diseases, it is likely that the study of rarer conditions, in particular those manifesting during childhood and/or with a dominant mode of inheritance, would be more difficult using genetic databases, as individuals with such conditions are less likely to be included in population-level genetic biobanks (such as UK Biobank) due to a healthy volunteer bias. It is important that future genetic datasets are more representative in their recruitment to ensure that rare genetic diseases are not systematically excluded or underrepresented among participants. Studies such as All Of Us in the US, and Our Future Health and the Generation Study in the UK, will be extremely helpful in addressing this point.
  - Estimates of the symptomatic Fabry disease population in the US in 2024 were calculated using the most up-to-date penetrance estimates in males and females. However, these estimates were calculated using individuals already present in a Fabry registry and therefore may overestimate the penetrance, especially among females, since asymptomatic carriers may be less likely to join a disease registry. Accurate calculation of the symptomatic population with a given genetic disease relies upon accurate penetrance estimates, which are not always available. These estimates are best calculated from large population-level resources with linked genetic and electronic health record data.
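The carrier arithmetic described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a single pooled pathogenic allele frequency q, with hemizygous males carrying at frequency q and females at 1 - (1 - q)^2 under Hardy-Weinberg proportions, and the function names are hypothetical. The carrier counts and penetrance values (100% for males, 70% for females) are the ones reported in the abstract.

```python
from fractions import Fraction


def x_linked_carrier_freqs(q):
    """Return (male, female) carrier frequencies for an X-linked allele of frequency q."""
    male = q                     # hemizygous: a single X chromosome
    female = 1 - (1 - q) ** 2    # at least one pathogenic allele across two X chromosomes
    return male, female


def symptomatic(carriers, penetrance_percent):
    """Expected symptomatic carriers, rounded half up with integer arithmetic."""
    return (carriers * penetrance_percent + 50) // 100


# Male carrier frequency reported in the abstract: 1 in 14,022 (pooled; an assumption here).
q = Fraction(1, 14022)
male_freq, female_freq = x_linked_carrier_freqs(q)
print(f"female carriers: about 1 in {float(1 / female_freq):.0f}")
print("symptomatic males:", symptomatic(12024, 100))
print("symptomatic females:", symptomatic(24845, 70))
```

With a single pooled q, the female carrier frequency comes out near 1 in 7,011, close to but not identical to the reported 1 in 6,978, presumably because the study computed carrier counts separately by sex and ancestry group.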
Velkova, I.; Cappato, S.; Rivera, D.; Romano, F.; Schonegger, D.; Bocciardi, R.; Hainaut, P.; De Marco, P.; Gismondi, V.; Cirmena, G.; Menta, L.; Ognibene, M.; Garaventa, A.; Manzitti, C.; Brugnara, S.; Ciribilli, Y.; Bisio, A.; Marcaccini, E.; Malatesta, P.; Faravelli, F.; Menichini, P.; Monti, P.; Capra, V.
The TP53 gene encodes the well-known P53 tumor suppressor protein, which plays a crucial role in preventing cancer development. Germline TP53 variants cause Li-Fraumeni Syndrome (LFS), an autosomal dominant disorder associated with early-onset cancers, including breast cancer, brain tumors, leukemias, bone cancers, and soft tissue sarcomas. Functional studies in yeast and human cells demonstrated that TP53 variants can have various effects, such as partial or complete loss of function and even gain of pro-oncogenic activities. Here, we identified a germline TP53 variant c.671A>C, resulting in the missense mutant protein p.E224A, in the context of early-onset retroperitoneal rhabdomyosarcoma occurring in a child with a notable family history of cancer, suggestive of LFS. The variant was initially classified as a variant of uncertain significance (VUS). Functional assays in yeast and human cells demonstrated wild-type-like activity of the protein p.E224A; however, in silico analysis predicted a splicing defect at the RNA level, which we further investigated using a minigene approach. This analysis showed that the variant c.671A>C causes the skipping of exon 6, potentially introducing a frameshift in cDNA and a premature stop codon, which likely triggers nonsense-mediated mRNA decay; the loss of heterozygosity at the c.671 position in the parental TP53 transcript further confirmed the splicing impairment. In summary, these findings supported reclassifying the TP53 germline variant c.671A>C (p.E224A) from VUS to likely pathogenic, providing a definitive molecular diagnosis for family counseling. Additionally, this study sheds light on how certain TP53 variants defined as missense can be linked to disease mechanisms through RNA splicing disruption, highlighting the need for their deep functional assessment.
Serpa, G.; Gong, Q.; De, M.; Rana, P. S. J. B.; Montgomery, C. P.; Wozniak, D. J.; Long, M. E.; Hemann, E. A.
Cystic fibrosis (CF) is caused by homozygous mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene, resulting in multi-organ dysfunction and decreased lifespan and quality of life. A durable cure for CF will likely require a gene therapy approach to correct CFTR. Rapid advancements in genome editing technologies such as CRISPR/Cas9 have already resulted in successful FDA approval for cell-based gene editing therapies, providing new therapeutic avenues for many rare diseases. However, immune responses to gene therapy delivery vectors and editing tools remain a challenge, especially for strategies targeting complex in vivo tissues such as the lung. Previous findings in non-CF healthy individuals reported pre-existing antibody and T cell dependent immune responses to recombinant Cas9 proteins, suggesting potential additional obstacles for incorporation of CRISPR/Cas9 technologies in gene therapies. To determine if pre-existing immunity to Cas9 from S. aureus or S. pyogenes was present or augmented in people with CF (PwCF), anti-Cas9 IgG levels and Cas9-specific T cell responses were determined from peripheral blood samples of PwCF and non-CF healthy controls. Overall, non-CF controls and PwCF displayed evidence of pre-existing antibody and T cell responses to both S. aureus and S. pyogenes Cas9, although there were no significant differences between the two populations. However, we observed global changes in activation of Th1 and CD8 T cell responses as measured by IFNγ and TNF that warrant further investigation and mechanistic understanding as this finding has implications not only for CRISPR/Cas9 gene therapy for PwCF, but also for protection against infectious disease.
Scala, M.; Bradley, C. A.; Howe, J. L.; Trost, B.; Bautista Salazar, N.; Shum, C.; Reuter, M. S.; MacDonald, J. R.; Ko, S. Y.; Frankland, P. W.; Granger, L.; Anadiotis, G.; Pullano, V.; Brusco, A.; Keller, R.; Parisotto, S.; Pedro, H. F.; Lusk, L.; Pojomovsky McDonnell, P.; Helbig, I.; Mullegama, S. V.; Douine, E. D.; Russell, B. E.; Nelson, S. F.; Zara, F.; Scherer, S. W.
Autism Spectrum Disorder (ASD) exhibits an approximately 4:1 male-to-female sex bias and is characterized by early-onset impairment of social/communication skills, restricted interests, and stereotyped behaviors. Disruption of the Xp22.11 locus has been associated with ASD in males. This locus includes the three-exon PTCHD1 gene, an adjacent multi-isoform long noncoding RNA (lncRNA) named PTCHD1-AS (spanning ~1 Mb), and a poorly characterized single-exon RNA helicase named DDX53 that is intronic to PTCHD1-AS. While the relationship between PTCHD1/PTCHD1-AS and ASD is being studied, the role of DDX53 has not been examined, in part because there is no apparent functional murine orthologue. Through clinical testing, here, we identified 6 males and 1 female with ASD from 6 unrelated families carrying rare, predicted-damaging or loss-of-function variants in DDX53. Then, we examined databases, including the Autism Speaks MSSNG and Simons Foundation Autism Research Initiative, as well as population controls. We identified 24 additional individuals with ASD harboring rare, damaging DDX53 variations, including the same variants detected in two families from the original clinical analysis. In this extended cohort of 31 participants with ASD (28 male, 3 female), we identified 25 mostly maternally-inherited variations in DDX53, including 18 missense changes, 2 truncating variants, 2 in-frame variants, 2 deletions in the 3′ UTR and 1 copy number deletion. Our findings in humans support a direct link between DDX53 and ASD, which will be important in clinical genetic testing. These same autism-related findings, coupled with the observation that a functional orthologous gene is not found in mouse, may also influence the design and interpretation of murine-modelling of ASD.
Zhang, Y.; Ahsan, M. U.; Wang, K.
Previous genetic studies in ASD identified hundreds of high-confidence ASD genes enriched with likely deleterious protein-coding de novo mutations (DNMs). Multiple studies also demonstrated that DNMs in the non-coding genome can contribute to ASD risk. However, identification of individual risk genes enriched with noncoding DNMs has remained largely unexplored. We analyzed two datasets with over 5000 ASD families to assess the contribution of noncoding DNMs. We used two methods to assess statistical significance for noncoding DNMs: a point-based test that analyzes sites that are likely functional, and a segment-based test that analyzes 1 kb genomic segments with segment-specific background mutation rates. We found that coding and noncoding DNMs in SCN2A are associated with ASD risk. Further application of these approaches to large-scale whole genome sequencing data will aid in identifying additional candidate ASD risk genes.
Hauser, B. M.; Luo, Y.; Nathan, A.; Gaiha, G. D.; Vavvas, D.; Comander, J.; Pierce, E. A.; Place, E. M.; Bujakowska, K. M.; Rossin, E. J.
With continued advances in gene sequencing technologies comes the need to develop better tools to understand which mutations cause disease. Here we validate structure-based network analysis (SBNA)1, 2 in well-studied human proteins and report results of using SBNA to identify critical amino acids that may cause retinal disease if subject to missense mutation. We computed SBNA scores for genes with high-quality structural data, starting with validating the method using 4 well-studied human disease-associated proteins. We then analyzed 47 inherited retinal disease (IRD) genes. We compared SBNA scores to phenotype data from the ClinVar database and found a significant difference between benign and pathogenic mutations with respect to network score. Finally, we applied this approach to 65 patients at Massachusetts Eye and Ear (MEE) who were diagnosed with IRD but for whom no genetic cause was found. Multivariable logistic regression models built using SBNA scores for IRD-associated genes successfully predicted pathogenicity of novel mutations, allowing us to identify likely causative disease variants in 37 patients with IRD from our clinic. In conclusion, SBNA can be meaningfully applied to human proteins and may help predict mutations causative of IRD.
Taylor, R. D.; Poulter, J. A.; Cockburn, J.; Ladbury, J. E.; Peckham, M.; Johnson, C. A.
Primary ciliopathies are a group of inherited developmental disorders resulting from defects in the primary cilium. Mutations in CEP290 (Centrosomal protein of 290kDa) are the most frequent cause of recessive ciliopathies (incidence up to 1:15,000). Pathogenic variants span the full length of this large (93.2kb) 54 exon gene, causing phenotypes ranging from isolated inherited retinal dystrophies (IRDs; Leber Congenital Amaurosis, LCA) to a pleiotropic range of severe syndromic multi-organ ciliopathies affecting retina, kidney and brain. Most pathogenic CEP290 variants are predicted null (37% nonsense, 42% frameshift), but there is no clear genotype-phenotype association. Almost half (26/53) of the coding exons in CEP290 are in-phase "skiptic" (or skippable) exons. Variants located in skiptic exons could be removed from CEP290 transcripts by skipping the exon, and nonsense-associated altered splicing (NAS) has been proposed as a mechanism that attenuates the pathogenicity of nonsense or frameshift CEP290 variants. Here, we have used in silico bioinformatic techniques to study the propensity of CEP290 skiptic exons for NAS. We then used CRISPR-Cas9 technology to model CEP290 frameshift mutations in induced pluripotent stem cells (iPSCs) and analysed their effects on splicing and ciliogenesis. We identified exon 36, a hotspot for LCA mutations, as a strong candidate for NAS that we confirmed in mutant iPSCs that exhibited sequence-specific exon skipping. Exon 36 skipping did not affect ciliogenesis, in contrast to a larger frameshift mutant that significantly decreased cilia size and incidence in iPSCs. We suggest that sequence-specific NAS provides the molecular basis of genetic pleiotropy for CEP290-related disorders.
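The idea of an in-phase (skippable) exon can be made concrete with a small check: skipping an exon preserves the downstream reading frame only if the exon's length is a multiple of three. The sketch below assumes that this is what "in-phase" means here and that exon coordinates are available as 1-based inclusive (start, end) pairs; the function name and the coordinates are hypothetical and are not CEP290 annotations.

```python
def is_in_phase(start, end):
    """True if an exon with 1-based inclusive coordinates has a length divisible by 3."""
    length = end - start + 1
    return length % 3 == 0


# Hypothetical exon coordinates, for illustration only.
exons = {"exon_A": (101, 250), "exon_B": (400, 500)}

# Keep the exons whose removal would leave the reading frame intact.
skippable = [name for name, (s, e) in exons.items() if is_in_phase(s, e)]
print(skippable)
```

A frame-preserving length is necessary but not sufficient for a tolerated skip: the shortened protein must also remain functional, which is what the ciliogenesis assays in the abstract address.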
Shil, A.; Arava, N.; Levi, N.; Levine, L.; Golan, H.; Meiri, G.; Michaelovski, A.; Tsadaka, Y.; Aran, A.; Menashe, I.
Background: Discerning clinically relevant ASD candidate variants from whole-exome sequencing (WES) data is complex, time-consuming, and labor-intensive. To this end, we developed AutScore, an integrative prioritization algorithm of ASD candidate variants from WES data, and assessed its performance to detect clinically relevant variants. Methods: We studied WES data from 581 ASD probands and their parents, registered in the Azrieli National Center database for Autism and Neurodevelopment Research. We focused on rare (allele frequency <1%), high-quality proband-specific variants affecting genes associated with ASD or other neurodevelopmental disorders (NDDs). We assigned a score (i.e., AutScore) to each such variant based on its pathogenicity, clinical relevance, gene-disease association, and inheritance pattern. Finally, we compared the AutScore performance with the rating of clinical experts and the NDD variant prioritization algorithm, AutoCasC. Results: Overall, 1161 ultra-rare variants distributed in 687 genes in 441 ASD probands were evaluated by AutScore with scores ranging from -4 to 25, with a mean ± SD of 5.89 ± 4.18. An AutScore cut-off of ≥12 outperforms AutoCasC in detecting clinically relevant ASD variants, with a detection accuracy rate of 72.3% and an overall diagnostic yield of 11.9%. Sixteen variants with AutScore of ≥12 were distributed in fifteen novel ASD genes. Conclusion: AutScore is an effective automated ranking system for ASD candidate variants that could be implemented in ASD clinical genetics pipelines.
Brunfeldt, M.; Vrijenhoek, T.; Kaariainen, H.
To study European biobanks' policies, practices, and experiences on communicating individual research results to participants, the EU Horizon 2020 Project Genetics Clinic of the Future performed two surveys in 2016 and 2020. First, a questionnaire was sent in 2016 (Survey I) to 351 European biobanks in 13 countries that were members of Biobanking and Biomolecular Resources Research Infrastructure - European Research Infrastructure Consortium (BBMRI-ERIC). We received replies from 72 biobanks (response rate 21%), representing each of the 13 BBMRI Member States. Respondents were mainly directors or heads of biobanks. To evaluate how the policies and practices of biobanks evolved over time, we also conducted another survey in 2020 (Survey II). Survey I was implemented using the web-based Webropol tool, and Survey II was distributed by email. The biobanks had very different policies of sharing genomic data and the policies had changed over time. The percentage of biobanks with a policy to share results with participants if they so wish had increased between 2016 and 2020 from 36% to 45%. In contrast, the percentage of biobanks with a policy to pro-actively re-contact the participants to share (some) results had decreased from 52% to 39%. Still in 2020, half of the biobanks had never shared results with participants.
Corredor, J. L.; Dodd-Eaton, E. B.; Woodman-Ross, J.; Woodson, A.; Nguyen, N. H.; Peng, G.; Green, S.; Gutierrez, A. M.; Arun, B. K.; Wang, W.
Genetic counseling and testing for germline mutations are essential for identifying individuals at increased risk for cancer. Pathogenic variants in TP53 are diagnostic of Li-Fraumeni syndrome (LFS), a highly penetrant disorder with diverse, early-onset tumors. Current clinical guidelines, such as Chompret and Classic criteria, provide frameworks for identifying individuals at risk for likely pathogenic/pathogenic TP53 variants; however, genetic counselors often encounter patients with features concerning for LFS that do not clearly meet established criteria, creating challenges for risk assessment and testing decisions. We evaluated whether LFSPRO, a Mendelian, family-history-based model that estimates an individual's probability of harboring a deleterious TP53 variant, improves carrier identification relative to guideline criteria. In a prospectively collected cohort of 182 probands who underwent clinical genetic counseling and germline TP53 testing, LFSPRO showed superior discrimination compared with Chompret criteria, with higher sensitivity (81% vs. 33%) and specificity (88% vs. 65%) and improved predictive values (PPV 0.53 vs. 0.14; NPV 0.96 vs. 0.85). Receiver operating characteristic analysis confirmed strong discriminatory performance (AUC=0.88). Calibration analysis using observed-to-expected ratios indicated good agreement between predicted and observed carrier frequencies (Observed/Expected=1.07). These findings demonstrate that LFSPRO outperforms traditional guideline-based criteria for identifying TP53 mutation carriers in real-world clinical settings. By providing quantitative, well-calibrated carrier probabilities rather than binary classifications, LFSPRO can enhance genetic counseling and support testing decisions, particularly for individuals who do not clearly meet existing criteria.
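The relationship between the reported sensitivity, specificity and predictive values can be illustrated with Bayes' rule. The sketch below is not part of LFSPRO; the function name is hypothetical, and the carrier prevalence of 0.14 is an illustrative assumption, since the abstract does not state how many of the 182 probands carried a TP53 variant.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    tp = sensitivity * prevalence               # true positives per tested individual
    fp = (1 - specificity) * (1 - prevalence)   # false positives
    tn = specificity * (1 - prevalence)         # true negatives
    fn = (1 - sensitivity) * prevalence         # false negatives
    return tp / (tp + fp), tn / (tn + fn)


ASSUMED_PREVALENCE = 0.14  # illustrative assumption, not reported in the abstract

# Sensitivity and specificity as reported in the abstract.
for name, sens, spec in [("LFSPRO", 0.81, 0.88), ("Chompret", 0.33, 0.65)]:
    ppv, npv = predictive_values(sens, spec, ASSUMED_PREVALENCE)
    print(f"{name}: PPV={ppv:.2f}, NPV={npv:.2f}")
```

Under this assumed prevalence the computed values land close to the reported PPV and NPV for both methods, which shows how strongly predictive values depend on the carrier rate in the tested cohort.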
Mukhopadhyay, N.; Feingold, E. E.; Brand, H.; Lee, M. K.; Kurtas, E. N.; Sanchis-Juan, A.; Moreno-Uribe, L.; Wehby, G.; Valencia-Ramirez, L. C.; Restrepo Muneton, C. P.; Padilla, C.; Deleyiannis, F.; Poletta, F. A.; Orioli, I. M.; Hecht, J. T.; Buxo, C. J.; Butali, A.; Adeyemo, W. L.; Abebe, M. E.; Vieira, A. R.; Shaffer, J. R.; Murray, J. C.; Weinberg, S. M.; Ruczinski, I.; Leslie-Clarkson, E. J.; Marazita, M. L.
Objective: Our understanding of the genetic causes of non-syndromic orofacial clefts (OFCs) is based largely upon genetic studies of common and rare nucleotide variants. Less is known about the role of copy number variations (CNVs) and the studies published to date have been limited to either small samples or targeted genomic regions. The objective of our study is to investigate the contribution of CNVs spread across the entire genome to OFC risk in a large multi-ancestry cohort. Methods: We utilized PennCNV on microarray genotyping data to detect CNVs in 10,240 participants (2,484 with clefts, 7,756 unaffected). 70,695 quality-filtered autosomal CNVs (49,660 deletions, 21,035 duplications) were used to assign normal/abnormal copy number statuses at 67,199 positions from the GRCh37 genome assembly. Genome-wide association was run between cleft status and copy number status. Results: We observed a highly significant association between OFCs and deletions on chromosome 7p14.1 (p=1.32e-35) driven by Central and South American ancestry (p=1.04e-25) participants, with less significant contributions from European (p=3.37e-08) and Asian (p=0.01) ancestry participants. We also observed four other loci with p-values below 10e-04. Conclusion: The 7p14.1 association observed in our study is a replication of two prior studies in independent cohorts of European ancestry. However, this locus lies in a T-cell receptor region that is subject to somatic rearrangements that decrease in frequency with age and may affect genetic association results. Our data show age effects as well as differences between blood and saliva samples. Thus, our results can be interpreted either as supporting a previously established association with orofacial clefts, or as questioning those previous results in favor of a hypothesis about the behavior of somatic rearrangements in T-cell receptor regions.
LeMaster, C.; Schwendinger-Schreck, C.; Ge, B.; Cheung, W.; Johnston, J.; Pastinen, T.; Smail, C.
Recent studies have revealed the pervasive landscape of rare structural variants (rSVs) present in human genomes. rSVs can have extreme effects on the expression of proximal genes and, in a rare disease context, have been implicated in patient cases where no diagnostic single nucleotide variant (SNV) was found. Efforts to integrate rSVs to date have focused on targeted approaches in known Mendelian rare disease genes. This approach is intractable for rare diseases with many causal loci or patients with complex, multi-phenotype syndromes. We hypothesized that integrating trait-relevant polygenic scores (PGS) would provide a substantial reduction in the number of candidate disease genes in which to assess rSV effects. We further implemented a method for ranking PGS genes to define a set of core/key genes where an rSV has the potential to exert relatively larger effects on disease risk. Among a subset of patients enrolled in the Genomic Answers for Kids (GA4K) rare disease program (N=497), we used PacBio HiFi long-read whole genome sequencing (lrWGS) to identify rSVs intersecting genes in trait-relevant PGSs. Illustrating our approach in Autism (N=54 cases), we identified 22,019 deletions, 2,041 duplications, 87,826 insertions, and 214 inversions overlapping putative core/key PGS genes. Additionally, by integrating genomic constraint annotations from gnomAD, we observed that rare duplications overlapping putative core/key PGS genes were frequently in higher constraint regions compared to controls (P = 1 × 10⁻³). This difference was not observed in the lowest-ranked gene set (P = 0.15). Overall, our study provides a framework for the annotation of long-read rSVs from lrWGS data and prioritization of disease-linked genomic regions for downstream functional validation of rSV impacts. To enable reuse by other researchers, we have made SV allele frequencies and gene associations freely available.
Bhandarkar, A. A.; Kelly-Foleni, N. E.; Sarkar, D.; Jeffs, A.; Slatter, T.; Braithwaite, A.; Mehta, S.
TP53 undergoes alternative splicing to produce multiple mRNA transcripts and protein isoforms, yet the effects of splice site mutations on isoform regulation, tumor-biology, and clinical outcome remain unclear. Analysis of 23,017 TP53 variants, including 18,562 somatic mutations (pan-cancer datasets - cBioPortal) and 4,455 germline variants (IARC database), identified recurrent donor (X32, X125, X224, X261, X331) and acceptor (X33, X126, X187, X225, X307, X332) splice site mutations. Germline variants showed nucleotide-specific transition biases. Most splice site mutations were associated with reduced TP53 mRNA expression; however, X32, X33, X126, and X261 maintained or elevated transcript levels. Splice mutations were associated with distinct transcriptional subsets marked by altered p53 target gene expression, elevated tumor mutation burden, increased genomic instability, and significantly reduced disease-free survival compared to missense mutations, with X126 and X331 being associated with poorest outcomes. These findings emphasize the clinical impact of TP53 splice site mutations and the need for functional classification.
Engchuan, W.; Han, K.; Feitosa, R. M.; Salazar, N. B.; Mager, D. J.; Wu, S.; Ali, F.; Chan, A.; Mendes de Aquino, M.; Zhou, X.; Shaath, R.; Safarian, N.; Thiruvahindrapuram, B.; Nalpathamkalam, T.; Pellecchia, G.; de Rijke, J.; Zarrei, M.; Breetvelt, E.; Scherer, S. W.; Trost, B.; Vorstman, J.
Compound heterozygous events involving a chromosomal deletion and a functional DNA sequence-level variant on the remaining allele can underpin a range of medical conditions. Most large-scale genetic studies do not include a systematic analysis of such compound heterozygous deletion (DelCH) events. We developed three frameworks: i) traditional burden analysis; ii) deletion-matched burden analysis; and iii) transmission disequilibrium test (TDT), to examine the possible contribution of DelCH to clinical presentations, and report results of their implementation in 9,766 families of autistic individuals. Across the three strategies, we observed enrichment of rare DelCH events in autistic individuals at a nominal significance level for individual tests. Collectively, six genes (CFHR4, HSDL1, MYO15A, NEFH, and three olfactory receptor genes: OR1A2, OR4P2) were affected by DelCH events in at least two unrelated autistic individuals (and not in unaffected family members), while the reverse analyses identified no genes (p < 2.2 × 10⁻¹⁶). Gene set enrichment analysis of the extended network of candidate genes showed a remarkable convergence to processes related to neurogenesis. Our findings suggest a modest role for DelCH events in ASD. The strategies described here are available via a GitHub repository, allowing the research community to examine the role of DelCH in other genome sequencing cohorts.
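For readers unfamiliar with the transmission disequilibrium test, the classic form of the statistic can be sketched as below. This is the textbook McNemar-type TDT, not the DelCH-specific framework from the authors' repository, whose details the abstract does not give; the function name and the counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Classic TDT: chi-square statistic (1 df) and p-value from heterozygous-parent transmissions."""
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: the allele was transmitted 30 times and not transmitted 10 times.
stat, p = tdt(30, 10)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

Equal transmission and non-transmission counts give a statistic of zero and a p-value of one, which is the null expectation when the allele is unrelated to the phenotype.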
Canson, D. M.; Llinares-Burguet, I.; Fortuno, C.; Sanoguera-Miralles, L.; Bueno-Martinez, E.; de la Hoya, M.; Spurdle, A. B.; Velasco-Sampedro, E. A.
Germline TP53 genetic variants that disrupt splicing are implicated in hereditary cancer predisposition, while somatic variants contribute to tumorigenesis. We investigated the role of TP53 splicing regulatory elements (SREs), including G-runs that act as intronic splicing enhancers, using exons 3 and 6 and their downstream introns as models. Minigene microdeletion assays revealed four SRE-rich intervals: c.573_598, c.618_641, c.653_669 and c.672+14_672+36. A diagnostically reported deletion c.655_670del, overlapping an SRE-rich interval, induced an in-frame transcript Δ(E6q21) from new donor site usage. Within intron 6, deletion of at least four G-runs led to 100% aberrant transcript expression. Additionally, assay results suggested a donor-to-branchpoint distance cutoff of <50 nt for complete splicing aberration due to spatial constraint, and >75 nt for low risk of splicing abnormality. Overall, splicing data for 134 single nucleotide variants (SNVs) and 27 deletions in TP53 demonstrated that SRE-disrupting SNVs have weak splicing impact (up to 26% exon skipping), while deletions spanning multiple SREs can have profound splicing effects. Results also provide more data to inform splicing impact prediction for intronic deletions that shorten intron size.
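The donor-to-branchpoint distance cutoffs suggested above can be expressed as a simple rule. This is an illustrative sketch, not the authors' tool: the function name and category labels are hypothetical, and the abstract does not characterize distances between 50 and 75 nt, so that range is labeled "uncertain" here as an assumption.

```python
def branchpoint_distance_risk(distance_nt: int) -> str:
    """Classify splicing risk from the donor-to-branchpoint distance (in nt)
    that remains after an intronic deletion shortens the intron."""
    if distance_nt < 0:
        raise ValueError("distance must be non-negative")
    if distance_nt < 50:
        # Abstract: <50 nt suggests complete splicing aberration
        return "complete aberration expected"
    if distance_nt > 75:
        # Abstract: >75 nt suggests low risk of splicing abnormality
        return "low risk"
    # 50-75 nt is not characterized in the abstract (assumed label)
    return "uncertain"


for d in (40, 60, 90):
    print(d, branchpoint_distance_risk(d))
```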
Artaza, H.; Priyanka, D.; Molnes, J.; Lavrichenko, K.; Wolff, A. S. B.; Royrvik, E. C.; Skrivarhaug, T.; Vaudel, M.; Bratland, E.; Johansson, B.; Njolstad, P. R.; Johansson, S.
Technological advancements have significantly improved our understanding of Copy Number Variants (CNVs) and their role in disease. However, detecting CNVs in clinical diagnostics remains challenging, and important pathogenic CNVs may go undetected. This study systematically assessed the impact of rare, large, high-penetrance CNVs on pediatric diabetes and Maturity-onset Diabetes of the Young (MODY) in Norway. We analyzed data from the nationwide Norwegian Childhood Diabetes Registry (NCDR) covering 2002-2018 and the Norwegian MODY Registry (NMR) from 1997-2019. CNV detection was performed using the Illumina Infinium Global Screening Array-24 v2.0 on a total of 5,889 individuals, and we compared the results to diagnostic records. Our findings indicate that 0.63% of cases in the Norwegian MODY Registry and 0.09% of cases in the Norwegian Childhood Diabetes Registry are attributable to established pathogenic large copy number deletions detectable by array genotyping. Notably, six of the 14 pathogenic deletions identified (in the HNF1B [n=3], HNF1A [n=2], or GATA4 [n=1] genes) had not been detected by routine diagnostic screening. For these individuals, accurate molecular diagnoses have significant implications for personalized treatment and follow-up. We found no evidence suggesting a major role for additional rare CNVs beyond the already established pathogenic CNVs in MODY. In conclusion, while pathogenic CNVs are rare, they remain relevant for patients in the Norwegian nationwide diabetes registries. Expanding screening for MODY variants, specifically 17q12-HNF1B and HNF1A deletions, to a larger portion of the pediatric diabetes population should be considered.
Engal, E.; Oja, K. T.; Maroofian, R.; Geminder, O.; Le, T.-L.; Mor, E.; Tzvi, N.; Elefant, N.; Zaki, M. S.; Gleeson, J. G.; Muru, K.; Pajusalu, S.; Wojcik, M. H.; Pachat, D.; Abd Elmaksoud, M.; Jeong, W. C.; Lee, H.; Bauer, P.; Zifarelli, G.; Houlden, H.; Elpeleg, O.; Gordon, C.; Harel, T.; Ounap, K.; Salton, M.; Mor-Shaked, H.
Over two dozen spliceosome proteins are involved in human diseases, collectively referred to as spliceosomopathies. WBP4 (WW Domain Binding Protein 4) is part of the early spliceosomal complex and has not previously been described in the context of human pathologies. Through GeneMatcher, we identified eleven patients from eight families with a severe neurodevelopmental syndrome with variable manifestations. Clinical manifestations included hypotonia, global developmental delay, severe intellectual disability, brain abnormalities, and musculoskeletal and gastrointestinal abnormalities. Genetic analysis revealed five different homozygous loss-of-function variants in WBP4. Immunoblotting on fibroblasts from two affected individuals with different genetic variants demonstrated complete loss of protein, and RNA sequencing analysis uncovered shared abnormal splicing patterns, including enrichment for genes associated with abnormalities of the nervous and musculoskeletal systems, suggesting that the overlapping differentially spliced genes are related to the common phenotypes of the probands. We conclude that biallelic variants in WBP4 cause a spliceosomopathy. Further functional studies are needed to better understand the mechanism of pathogenicity.
Lee, K. A. V.; Whitman, M. C.
Objective: To identify rare and common CNVs associated with strabismus and amblyopia and to determine whether these variants reveal overlapping genetic mechanisms between the two disorders. Design: Case-control association study using structural variant calls from short-read whole-genome sequencing. Subjects, participants, and controls: 1,141 adults with strabismus, 566 with amblyopia (157 with both), and controls (95,806 for strabismus; 96,381 for amblyopia) enrolled in the All of Us Research Program and with available structural variant calls. Methods: CNVs were called using the GATK-SV pipeline from short-read whole-genome sequencing (srWGS). After quality control measures, including requiring two types of evidence and identification by two different calling algorithms, CNVs present in 20 or more affected individuals were divided into rare (<1% population frequency) and common (>1% population frequency). The rate of each CNV was compared between affected individuals and controls. Significant CNVs were manually verified in IGV. Functional effects were annotated using Variant Effect Predictor from Ensembl. Main Outcome Measures: Odds ratios for CNV carrier status in cases versus controls, adjusted for multiple testing (Benjamini-Hochberg FDR < 0.05); functional annotation, dosage sensitivity, regulatory element overlap, and pathway enrichment. Results: Fourteen rare and 29 common CNVs were significantly associated with strabismus; 1 rare and 2 common CNVs were associated with amblyopia (45 unique CNVs total). The rare CNV associated with amblyopia is an intronic deletion in MDGA2. Two common intronic deletions (GRIN2B; CACNA1B) were associated with both conditions and highly predictive of comorbid strabismus and amblyopia (47% comorbidity when both present, p < .001). Implicated genes predominantly affect synapse formation and function (e.g., CSMD1, GRIN2B, CACNA1B, RIMS1), neuronal migration (e.g., TUBB, EML1), and neurodevelopment; 64% have known neurodevelopmental phenotypes and 27% have been linked to mental health disorders. Conclusions: CNVs highlight synaptic and neurodevelopmental pathways as central to strabismus and amblyopia etiology and provide the first evidence of shared genetic risk. The combination of GRIN2B and CACNA1B deletions identifies strabismus patients at high risk for amblyopia.
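The main outcome measure, odds ratios for carrier status with Benjamini-Hochberg adjustment, can be sketched as below. This is a minimal illustration rather than the authors' pipeline: the function names, carrier counts, and p-values are hypothetical, and only the cohort sizes (1,141 cases; 95,806 controls) come from the abstract.

```python
def odds_ratio(case_carriers: int, case_total: int,
               control_carriers: int, control_total: int) -> float:
    """Odds ratio for CNV carrier status in cases versus controls."""
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    if 0 in (a, b, c, d):
        # Assumption: Haldane-Anscombe correction when any cell is empty
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return BH-adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical carrier counts; cohort sizes are from the abstract.
print(round(odds_ratio(40, 1141, 1500, 95806), 2))

# Hypothetical per-CNV p-values; keep those with adjusted p < 0.05.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(q, 3) for q in adj])
```

A CNV would count as significant under the stated criterion when its adjusted p-value falls below 0.05.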
Mann, T.; Smith, A.; Spencer, S.; Thaventhiran, J.; Russell, A.
The functional validation of genetic variants of uncertain significance (VUS) found in patients with primary immunodeficiency (PID) by next-generation sequencing has traditionally been carried out in model systems that are susceptible to artefacts. We use CRISPR correction of primary human T lymphocytes to demonstrate that a specific variant in an IL-6R-deficient patient is causative for their condition. This methodology can be adapted and used for variant assessment of the heterogeneous genetic defects that affect T lymphocytes in PID.